Entropy correlation distance method applied to study correlations between the Gross Domestic Product of rich countries

The Theil index is widely used in economics and finance; it looks like the Shannon entropy, but
pertains to event values rather than to their probabilities. Any time series can be remapped through
the Theil index. Correlation coefficients can then be evaluated between the new time series, which
allows their mutual statistical distance to be studied and contrasted with the usual correlation
distance measure for the primary time series. As an example, this entropy-like correlation distance
method (ECDM) is applied to the Gross Domestic Product of 20 rich countries in order to test for an
economic globalization process. Hierarchical distances allow the construction of (i) a linear network
and (ii) a Locally Minimal Spanning Tree. The role of time averaging in finite size windows is
illustrated and discussed. It is also shown that the mean distance between the most developed
countries decreased from 1960 to 2000, which we consider to be a proof of globalization of the
economy for these countries.

1. Introduction

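The Theil¹ index, often used in economics and finance, is defined through

$$ Th(x; N) = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\langle x \rangle} \ln \frac{x_i}{\langle x \rangle}. \tag{1} $$
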
It serves to measure the distribution of the income $x_i$ of agent $i$ among $N$ agents, with
respect to the average income $\langle x \rangle$, the average being taken over the ensemble of
incomes of the population of size $N$. $Th(x; N)$ spans the range up to $\ln(N)$; $x$ is the vector
of data $(x_1, \ldots, x_N)$. It looks like the Shannon entropy but was invented to consider the
event values themselves rather than their probability of occurrence. One peculiarity is that it
measures the agent's share relative to the mean $\langle x \rangle$ of the population. In terms of
information theory, the Theil index measures the difference between the maximum entropy and its
present value. An interesting development is to consider that the $x_i$ quantity in Eq. (1) is time
dependent. Thus one can generalize the Theil index in order to remap a time series $x(t)$ in a
nonlinear way into a $Th(t)$, as done in Sect. 2, which recalls considerations outlined in
[Miskiewicz, 2008]. Thereafter, from the Theil mapped series one can look at time dependent
correlations between different data sets, distances, hierarchies, and other usual features, through
various techniques of data analysis, like those leading to or resulting from network constructions.
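
As a minimal sketch of the index itself, assuming the standard form Th = (1/N) Σ (x_i/⟨x⟩) ln(x_i/⟨x⟩), the following Python function evaluates it for a list of non-negative values. The function name and the toy inputs are illustrative assumptions, not part of the original analysis; the two calls only check the limiting cases of equal shares and of a single agent holding everything.

```python
import math


def theil_index(x):
    """Theil index of non-negative values x, assuming the standard (1/N) sum form."""
    n = len(x)
    mean = sum(x) / n
    total = 0.0
    for xi in x:
        share = xi / mean
        # s * ln(s) tends to 0 as s -> 0, so zero entries contribute nothing.
        if share > 0:
            total += share * math.log(share)
    return total / n


# Hypothetical toy inputs (not data from the paper).
equal = theil_index([5.0, 5.0, 5.0, 5.0])          # every agent holds the mean
concentrated = theil_index([0.0, 0.0, 0.0, 8.0])   # one agent holds everything
```

With these inputs the first value is zero and the second equals ln(4), i.e. ln(N) for N = 4.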



The first application is made here below to macroeconomic time series, in particular to the GDP
of 20 among the richest countries. Following up on studies of correlations between GDPs of rich
countries [Miskiewicz & Ausloos, 2005; Miskiewicz & Ausloos, 2007; Ausloos & Lambiotte, 2007;
Ausloos & Miskiewicz, 2008; Miskiewicz, 2008; Miskiewicz & Ausloos, 2008; Ausloos & Gligor, 2008;
Gligor & Ausloos, 2008a; Gligor & Ausloos, 2008b], we have analyzed web-downloaded data on GDP²,
used as individual wealth signatures of a country's economic state ("status"). We have calculated
the fluctuations of the Theil mapped GDP in different time windows and looked for correlations, and
subsequent distances, as reported in Sect. 2.

Usually, a complex system can be represented by a network, the nodes being scalars, i.e. agents,
here the countries, while the links carry weights, here measures of distances between two $Th(t)$
taken from GDP fluctuation correlations between two countries. Indeed time series can be
represented by networks [Yang & Yang, 2008]. In order to extract structures from the networks, we
have also averaged the time correlations in different windows. This makes the properties of the
subsequent networks more robust and reveals evolving statistical distances. In line with our
previous work we have examined two different (so called) networks. The results are presented in
Sect. 3. A brief discussion on economy globalization follows with a conclusion in Sect. 4. It is
found that such a measure of collective habits does fit the usual expectations defined by
politicians or economists, i.e. common factors are to be searched for.



¹ H. Theil was a Dutch econometrician who was born on 13 October 1924 in Amsterdam, graduated
from the University of Amsterdam, succeeded Jan Tinbergen at the Erasmus University Rotterdam,
and later moved to and taught in Chicago and at the University of Florida. He died in 2000.



² From the Conference Board and Groningen Growth and Development Centre, Total Economy Database,
September 2008, http://www.conference-board.org/economics



2. From macroeconomy index input to network construction

2.1. GDP data 

GDP data sets of several among the richest OECD countries were used for illustrating the method,
i.e. 20 countries: Austria (AT), Belgium (BE), Canada (CA), Denmark (DK), Finland (FI), France (FR),
Greece (GR), Ireland (IR), Italy (IT), Japan (JP), the Netherlands (NL), Norway (NO), Portugal (PT),
Spain (ES), Sweden (SE), Switzerland (CH), Turkey (TK), U.K. (UK), U.S.A. (US), and Germany (DE),
allowing for a linear superposition of the data before the reunification in 1991 in the latter
case; an "ALL" country is also invented, as in previous works [Miskiewicz & Ausloos, 2005;
Miskiewicz & Ausloos, 2006; Miskiewicz & Ausloos, 2008; Miskiewicz, 2008], as the sum of the GDP of
the considered countries.³ Thus there are $N = 21$ time series to examine.

The data are taken from the Groningen Growth and Development Centre
(http://www.ggdc.net/index-dseries.html). The GDPs are presented in millions of 1990 US dollars
(converted at Geary-Khamis PPPs). In our case each time series starts in 1950 and finishes in 2007,
such that there are 58 data points in every time series. The evolution of a few cases is shown in
Figs. 1-2. The GDP values range between $14 \cdot 10^{10}$ and $171 \cdot 10^{10}$ US dollars in
the case of GR, up to $1.4 \cdot 10^{12}$ and $9.4 \cdot 10^{12}$ US dollars for US. Except for
some small "perturbations" the GDP of all presented countries grows in time. Some deviation from
monotonic growth can be observed, e.g. in the case of CH in 1973-1976 or of TK in 1998-2000.

2.2. Mapping onto the Theil index 

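The Theil index $Th$ can be used to nonlinearly map an original time series $A(t)$ into a new one
through

$$ Th_A(t, T_1) = \frac{1}{T_1} \sum_{j=t}^{t+T_1} \frac{A_j}{\langle A \rangle_{(t,T_1)}} \ln \frac{A_j}{\langle A \rangle_{(t,T_1)}}, \tag{2} $$

where the average $\langle A \rangle_{(t,T_1)}$ is made over the ensemble of points $j$ in a time
window of size $T_1$, placed between $t$ and $t + T_1$:

$$ \langle A \rangle_{(t,T_1)} = \frac{1}{T_1} \sum_{j=t}^{t+T_1} A_j, \tag{3} $$

i.e. the Theil index is calculated for the interval $[t, t+T_1]$. Here $A_i(t)$ is the GDP of
country $i$.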
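
The windowed mapping of Eqs. (2)-(3) can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the series is assumed strictly positive (as GDP values are), and each window is taken to contain exactly T1 consecutive points starting at t, which is one reading of the interval [t, t + T1].

```python
import math


def theil_map(series, t1):
    """Map a strictly positive series onto windowed Theil values, one per window start."""
    mapped = []
    for start in range(len(series) - t1 + 1):
        window = series[start:start + t1]      # assumed: T1 points per window
        mean = sum(window) / t1                # window average, as in Eq. (3)
        th = sum((a / mean) * math.log(a / mean) for a in window) / t1
        mapped.append(th)
    return mapped


# Hypothetical toy series (not GDP values from the paper).
flat = theil_map([2.0] * 8, 5)                                 # constant series
growing = theil_map([1.0, 2.0, 4.0, 8.0, 16.0, 32.0], 5)       # constant growth rate
```

A constant series maps onto zeros, and a series with a constant growth rate maps onto a constant positive value, since the index depends only on the shares relative to the window mean.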

Characteristic maximum Theil index values as a function of the time window size are given in
Table I. These

FIG. 1: (right y-axis) Raw GDP time series for BE, FR, GR, IT in millions of US dollars, and
(left y-axis) the resulting mapping into a Theil index for three time windows: $T_1$ = 5, 10,
15 yrs. Years on the bottom x-axis correspond to the initial point of the time interval.



³ The set deviates somewhat from previous works [Miskiewicz & Ausloos, 2006; Miskiewicz &
Ausloos, 2008; Miskiewicz, 2008] since there is neither Iceland nor Luxembourg, but there is
Turkey, in the present paper.

FIG. 2: (right y-axis) Raw GDP time series for CH, TK, UK, US in millions of US dollars, and
(left y-axis) the resulting mapping into a Theil index for three time windows: $T_1$ = 5, 10,
15 yrs. Years on the bottom x-axis correspond to the initial point of the time interval.

TABLE I: Maximum value of the Theil index ($10^5$) calculated during the time interval of
interest, for three time windows $T_1$ = 5, 10, 15 yrs. The countries are ranked in decreasing Th
value. US, CA, and JP have been extracted from the list and placed at the end of the data display
in order to emphasize the European hierarchy.



Th values vary between 0 and 0.05. Recall that Th is 0 if all $A_i = \langle A \rangle_{(t,T_1)}$
and is maximum if for one $i$, $A_i = N \langle A \rangle_{(t,T_1)}$ and all other $A_j = 0$. In
Table I we have emphasized the 17 European countries which we examined. It is remarkable that
there are a few but weak variations in the ranking, as a function of the window size. UK always
has the smallest Th. The Scandinavian countries come next, followed by the western countries, and
finally the Mediterranean ones. But DE is always the last one, having a large Th. It is remarkable
that IR is a ... Mediterranean country. This class might rather be labelled maritime. Finally, let
us observe that Th for $T_1$ = 15 yrs seems to lead to the most intuitive (geography and economy
based) grouping.

For illustration, we give in Figs. 1-2 the Theil index mapping of a few significant GDP time
series, in particular for the following countries, in (BE, FR, GR, IT) and outside (CH, TK, UK,
US) the EUR zone, and for a few $T_1$ time windows, i.e. 5, 10, 15 yrs.

The GDP Theil index of the presented countries for the medium ($T_1$ = 10 yrs) and long
($T_1$ = 15 yrs) time windows has a maximum at ca. 1960. In the case of the shortest time window,
$T_1$ = 5 yrs, the GDP Theil index remains on the same level in time, without any visually
meaningful extremal point, which makes such a behaviour difficult to analyse beyond a trivial
statement.

A few observations can be made. In the case of BE and FR, on one hand, and CH, on the other hand,
the Theil index decreases from 1960 till 1970 and remains on a stable level thereafter (especially
for the long time window). The Theil index of US, besides the main maximum at 1960 for the medium
size time window ($T_1$ = 10 yrs), has two other local maxima, i.e. at 1980 and 1990. However, for
the longest presented time window ($T_1$ = 15 yrs) only the main maximum at 1960 can be
distinguished; after 1970 a relatively stable value of the GDP Theil index can be observed, the
same, as was pointed out, as in BE, FR and CH. The Theil index of TK has its main maximum ca. 1970,
which is followed by a decrease till a minimum at 1975 and an increase until a maximum at 1980.

For the last few data points the Theil index of TK is increasing. A similar situation can be
observed in the GR case: the Theil index has been increasing since 1985. IT seems to have more
pronounced oscillations than the three other illustrative EUR countries, but the main maximum
seems to occur much earlier than for the other countries.

The Theil index of UK for the medium size time window ($T_1$ = 10 yrs) has two pronounced maxima,
one at 1960 and the second at 1980; an analogous behaviour can be observed for the 15 yrs time
window, but the first maximum is at 1955 and the second at 1965. Finally, after the pronounced
maximum in the early part of the data, US seems to have reached a stable level on which marked
oscillations are superposed.

2.3. Time series distances 

In order to compare time series, one can measure characteristic features, like their Hurst
exponent, their (multifractal and power) spectrum, ..., or their relative distance. Several
definitions of distances can be found in the literature. The distance between two time series
(here the Theil-mapped time series) is hereby defined as the absolute value of the difference
between mean values in the interval $[t, t + T_2]$. For further considerations to be explained
below, in Sec. 4, one could also consider non-equal time correlations, thus taking into account a
time lag $\tau$. Therefore we define

$$ d_{Th}(A,B)_{(t,T_1,T_2,\tau)} = \left| \left\langle Th_A(t,T_1) - Th_B(t+\tau,T_1) \right\rangle_{(t,T_2)} \right|. \tag{4} $$

In Eq. (4) the mean value denoted by brackets ($\langle \ldots \rangle$) is defined as in Eq. (3).
Such a mean value can be taken on a time window $T_2$ different from $T_1$. In the present paper
we will only report and discuss data when $\tau = 0$. As a result we have two time parameters:

1. the $T_1$ remapping time window while calculating the Th index and

2. the correlation window $T_2$.
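
The distance of Eq. (4) between two Theil-mapped series can be sketched as follows. The function name and the toy values are hypothetical, and the correlation window is assumed to contain T2 consecutive points starting at t; the lag argument defaults to zero, the only case reported in this paper.

```python
def theil_distance(th_a, th_b, t, t2, lag=0):
    """Absolute value of the mean of Th_A(s) - Th_B(s + lag) over s = t .. t + t2 - 1."""
    diffs = [th_a[s] - th_b[s + lag] for s in range(t, t + t2)]
    return abs(sum(diffs) / t2)


# Hypothetical Theil-mapped series (not values from the paper).
th_a = [0.010, 0.020, 0.030, 0.040]
th_b = [0.030, 0.030, 0.030, 0.030]

d_early = theil_distance(th_a, th_b, t=0, t2=2)   # mean of (-0.020, -0.010) in absolute value
d_late = theil_distance(th_a, th_b, t=2, t2=2)    # mean of (0.000, 0.010) in absolute value
```

Because the absolute value is taken after averaging, differences of opposite sign inside the window cancel, so this distance compares window means rather than point-by-point deviations.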

2.4. Network construction 

The matrices of distances between nodes obtained from Eq. (4) are analysed here below after
constructing two network structures and measuring their statistical properties. The following
networks are considered: (i) the unidirectional minimal length path (UMLP) and (ii) the locally
minimal spanning tree (LMST). The algorithms generating the mentioned networks are:

UMLP The network begins with an arbitrarily chosen country, here the ALL country; then the
closest neighbouring country is attached and becomes the new end of the network. The next
country closest to this end of the network is searched for and attached. The process continues
until all countries are attached.

LMST The root of the network is the pair of closest neighbouring countries. Then the country
closest to any node is searched for and attached. The algorithm continues until all countries
are attached to the network. The "ALL" country is not used in the network construction.
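
The two constructions described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the distance matrix is stored as a dictionary of dictionaries, ties are broken alphabetically (the text does not discuss ties), and the function names and toy distances are hypothetical. The LMST growth step is the same greedy rule used in Prim's minimum spanning tree algorithm, started here from the closest pair.

```python
def umlp(dist, start):
    """Unidirectional chain: always attach the unattached node closest to the current end."""
    chain = [start]
    remaining = set(dist) - {start}
    while remaining:
        end = chain[-1]
        # Ties are broken alphabetically (an assumption, not stated in the text).
        nearest = min(remaining, key=lambda n: (dist[end][n], n))
        chain.append(nearest)
        remaining.remove(nearest)
    return chain


def lmst(dist, nodes):
    """Tree seeded with the closest pair; then attach the outside node closest to any tree node."""
    nodes = sorted(nodes)
    pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]
    seed = min(pairs, key=lambda p: (dist[p[0]][p[1]], p))
    in_tree = set(seed)
    edges = [seed]
    while len(in_tree) < len(nodes):
        candidates = [(u, v) for u in in_tree for v in nodes if v not in in_tree]
        u, v = min(candidates, key=lambda p: (dist[p[0]][p[1]], p))
        in_tree.add(v)
        edges.append((u, v))
    return edges


# Hypothetical symmetric distances, for illustration only (not values from the paper).
raw = {("ALL", "A"): 1.0, ("ALL", "B"): 4.0, ("ALL", "C"): 5.0,
       ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 3.0}
dist = {}
for (u, v), d in raw.items():
    dist.setdefault(u, {})[v] = d
    dist.setdefault(v, {})[u] = d

chain = umlp(dist, "ALL")             # ALL heads the chain
tree = lmst(dist, ["A", "B", "C"])    # ALL left out, as stated in the LMST description
```

With these toy distances the chain is ALL, A, B, C and the tree consists of the links A-B and B-C.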

We emphasize that in the UMLP construction ALL is at the beginning of the chain, while in the
other construction ALL is treated as an ordinary country. The LMST network seed is the pair of
closest countries according to the appropriate distance matrix. The first network is linear and
essentially robust against a perturbation, such as removing or adding a country, or a regrettable
mathematical error, since it is based on a measure relative to a statistical mean. The LMST is
obviously a tree, rather compact, thus with very few branching levels, when only 21 data points
are involved in the construction. It is known that such a tree is far from robust.

Since two time windows (Theil mapping and correlation measure) are used simultaneously, the
total size of the time window is equal to the sum of the two time windows. In our analysis UMLP
and LMST networks were constructed for all time windows ranging from $T_1$ = 5 to 58 yrs and
$T_2$ = 1 to 58 yrs, moving along the time axis by a one year step. Therefore the number of
generated UMLP and LMST networks is equal to the time series length minus the total time window
size, i.e. the $T_1$ and $T_2$ parameters must satisfy the inequality $T_1 + T_2 < 58$ yrs,
whence the number of generated networks (Net) depends on the time window sizes and is equal to
$\nu = 58 - T_1 - T_2$.
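
As a worked illustration of this count, the relation ν = 58 − T1 − T2 can be evaluated for the four window combinations selected for discussion in Sect. 3.2 (window sizes in years):

```latex
\begin{align*}
(T_1, T_2) = (5, 10):  &\quad \nu = 58 - 5 - 10 = 43,\\
(T_1, T_2) = (10, 5):  &\quad \nu = 58 - 10 - 5 = 43,\\
(T_1, T_2) = (10, 10): &\quad \nu = 58 - 10 - 10 = 38,\\
(T_1, T_2) = (15, 15): &\quad \nu = 58 - 15 - 15 = 28.
\end{align*}
```

The count depends only on the sum of the two windows, so exchanging their roles leaves the number of networks unchanged.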

3. Results 

3.1. Theil index distance statistics 

In total there is a huge number of networks. Therefore only some cases are extracted for the
present report.⁴ We propose a visualisation of the data through a spectrogram method, using for
the x and y axes the time windows $T_2$ and $T_1$ respectively. The data values are represented
by grey pixels in a convenient order as indicated in Figs. 3-4.

The mean value and the standard deviation of the distances between nodes as a function of $T_1$
and $T_2$ are presented in Figs. 3-4 for the UMLP and LMST cases respectively. The largest value
of the mean distance, the minimum mean distance, the maximum and minimum


⁴ All cases are available from the authors upon request.

TABLE II: Upper part of the Table: Maximum and minimum mean Theil distance value for each type
of network, UMLP and LMST. The mean value is calculated over the distances between nodes on the
network and over the ensemble generated for the given time window parameters. The values of the
averaging windows $T_1$ and $T_2$ at which this maximum (minimum) occurs are indicated, together
with the corresponding number of networks $\nu$. Bottom part of the Table (std): maximum and
minimum values of the std for the parameter cases so indicated.



standard deviations as a function of the time windows $T_1$, $T_2$ are presented in Table II.

It can first be observed in general that the mean distance between countries and the
corresponding standard deviation are the largest for UMLP networks and the smallest for LMST
networks. The maximum of the mean distance occurs for the longest $T_1$ and the shortest $T_2$
window sizes. The minimum mean distance is found with the opposite combination of the time window
sizes, i.e. small $T_1$ and large $T_2$. Again let it be emphasized that the (max or min) mean
values do not necessarily occur at the (max or min) standard deviations.
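
The averaging behind these statistics can be sketched as below, assuming that "averaging over the network links and networks" means pooling the link distances of all ν networks generated for a given (T1, T2) pair. Both this pooling and the use of the population standard deviation are assumptions made for illustration; the function name and numbers are hypothetical.

```python
import statistics


def pooled_stats(networks):
    """Mean and population std of all link distances pooled over a set of networks."""
    links = [d for net in networks for d in net]
    return statistics.fmean(links), statistics.pstdev(links)


# Two hypothetical networks with two links each (not values from the paper).
mean_d, std_d = pooled_stats([[1.0, 3.0], [2.0, 2.0]])
```

For these toy links the pooled mean is 2 and the pooled standard deviation is the square root of 0.5.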





FIG. 3: (a) Mean distance and (b) standard deviation of the distance distribution between
countries in UMLP networks as a function of the $T_1$ and $T_2$ time window sizes. The distance
and standard deviation result from averaging over the network links and over the networks
generated in the moving time window.



3.2. Theil network evolution 

For further discussion the following time window size combinations were selected: ($T_1$ = 5
yrs, $T_2$ = 10 yrs), ($T_1$ = 10 yrs, $T_2$ = 5 yrs), ($T_1$ = 10 yrs, $T_2$ = 10 yrs),
($T_1$ = 15 yrs, $T_2$ = 15 yrs). The evolutions of the mean distance between countries are
presented in Figs. 5-6. Straight lines indicate visually remarkable features.

The general observations to be made at this stage are 
the following ones: 

• In all considered networks (UMLP and LMST) and 
for all window sizes three types of evolution can 
be distinguished: increase, decrease and relatively 
stable mean distances between countries. 

• The max/min ratio of the mean network size for the considered time windows spans between 6
and 13.

• It is worth noticing that for the time windows [($T_1$ = 5 yrs, $T_2$ = 10 yrs), ($T_1$ = 10
yrs, $T_2$ = 5 yrs), ($T_1$ = 10 yrs, $T_2$ = 10 yrs)] the maximum of the mean distance occurs
at about 1960.

• From then on the size of the network(s) decreases fast over a decade, up to 1970.

• Thereafter the mean distance remains small and relatively stable up to 2000 or so.

• The mean size increases again after 2000.

4. Conclusions

In conclusion, the most interesting results of this analysis 
are 

• The Th index values are quite small indicating a 
rather homogeneous set of values for the GDP cen- 
tered around the mean. 

• UK has the lowest Th index, indicating the most 
stable development. 

• ES and DE have the largest Th index which might 
be a result of political perturbations of this latter 
country in the investigated time interval. 

• Th values can, surprisingly, be grouped according to climatic rather than geographical
regions.

• A long time window for the Theil index and a short correlation window result in a bigger network

FIG. 4: (a) Mean distance and (b) standard deviation of the distance distribution between
countries in LMST networks as a function of the $T_1$ and $T_2$ time window sizes. The distance
is averaged over the network links and over the networks generated in the moving time window.



size and a higher standard deviation of the distance distribution. Such a network might be
suitable for clique formation analysis.

• The 15 yrs time window seems to give the most 
coherence from the intuitive grouping point of view. 

• A low value of Th does not necessarily mean that 
the network size decreases. 

• The mean distance between countries and the cor- 
responding std are the largest for the UMLP net- 
works and the smallest for the corresponding LMST 
networks. 

• The analysis shows, through the decrease of the network size, the existence of a globalization
process from 1960 till 1970 and its stabilisation thereafter, followed by a destabilisation
after 2000.

• The observation of the globalization process does 
not depend on the type of network constructed. 

A word is in order concerning the time lag $\tau$ which could be introduced in the analysis; see
Eq. (4). The time lag leads to an asymmetry in the correlation between the fluctuations. This
leads us to suggest further

FIG. 5: Yearly evolution of the mean value of the links (distances) between nodes for the UMLP
network deduced from the Theil mapping analysis of the GDP of the 20 examined countries. The
time window sizes are given above every plot.



studies, which could lead to identifying a set of leaders and followers. We have observed in
another, though related, analysis [Ausloos & Miskiewicz, 2008] that for increasing time lag the
mean distance between network nodes increases, thereby magnifying details of the evolution.

Finally let us stress the interest of studying graphs, in 

FIG. 6: Yearly evolution of the mean value of the links (distances) between nodes for the LMST
network deduced from the Theil mapping analysis of the GDP of the 20 examined countries. The
time window sizes are given above every plot.

⁵ This paper seems to reproduce some considerations from [Ausloos & Miskiewicz, 2008]. We are
aware that the presentation of results based on a similar data set, through two different
mappings, might induce some confusion in the reader. Our considerations in this paper indeed
correspond to taking q = 1 in [Ausloos & Miskiewicz, 2008]. It should be emphasized that



particular to derive weighted networks such as in this paper, in order to have some comparative
data organisation coherence.⁵